
    Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception

    Hearing-impaired people often struggle to follow the speech stream of an individual talker in noisy environments. Recent studies show that the brain tracks attended speech and that the attended talker can be decoded from neural data on a single-trial level. This raises the possibility of “neuro-steered” hearing devices in which the brain-decoded intention of a hearing-impaired listener is used to enhance the voice of the attended speaker from a speech separation front-end. So far, methods that use this paradigm have focused on optimizing the brain decoding and the acoustic speech separation independently. In this work, we propose a novel framework called brain-informed speech separation (BISS) in which the information about the attended speech, as decoded from the subject’s brain, is directly used to perform speech separation in the front-end. We present a deep learning model that uses neural data to extract the clean audio signal that a listener is attending to from a multitalker speech mixture. We show that the framework can be applied successfully to the decoded output from either invasive intracranial electroencephalography (iEEG) or non-invasive electroencephalography (EEG) recordings from hearing-impaired subjects. It also results in improved speech separation, even in scenes with background noise. The generalization capability of the system renders it a perfect candidate for neuro-steered hearing-assistive devices.
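    A minimal sketch of the conditioning idea, assuming PyTorch: a hypothetical mask-estimation front-end (`EnvelopeConditionedSeparator`) takes the mixture spectrogram together with a brain-decoded envelope of the attended talker and returns an estimate of the attended speech. The module name, layer sizes, and tensor shapes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class EnvelopeConditionedSeparator(nn.Module):
    """Hypothetical front-end sketching the BISS idea: estimate a
    time-frequency mask from the mixture plus a brain-decoded envelope
    of the attended talker (not the authors' actual architecture)."""

    def __init__(self, n_freq=128, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq + 1, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.mask = nn.Linear(2 * hidden, n_freq)

    def forward(self, mix_spec, decoded_env):
        # mix_spec: (batch, time, n_freq) mixture magnitude spectrogram
        # decoded_env: (batch, time) attended-speech envelope decoded from iEEG/EEG
        x = torch.cat([mix_spec, decoded_env.unsqueeze(-1)], dim=-1)
        h, _ = self.rnn(x)
        m = torch.sigmoid(self.mask(h))   # time-frequency mask in [0, 1]
        return m * mix_spec               # estimated spectrogram of the attended talker

# Toy usage: 4 s at 100 frames/s, 128 frequency bins.
model = EnvelopeConditionedSeparator()
attended = model(torch.rand(1, 400, 128), torch.rand(1, 400))
print(attended.shape)  # torch.Size([1, 400, 128])
```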

    Corticocortical evoked potentials reveal projectors and integrators in human brain networks.

    The cerebral cortex is composed of subregions whose functional specialization is largely determined by their incoming and outgoing connections with each other. In the present study, we asked which cortical regions can exert the greatest influence over other regions and the cortical network as a whole. Previous research on this question has relied on coarse anatomy (mapping large fiber pathways) or functional connectivity (mapping inter-regional statistical dependencies in ongoing activity). Here we combined direct electrical stimulation with recordings from the cortical surface to provide novel insight into directed, inter-regional influence within the cerebral cortex of awake humans. These networks of directed interaction were reproducible across strength thresholds and across subjects. Directed network properties included (1) a decrease in the reciprocity of connections with distance; (2) major projector nodes (sources of influence) in peri-Rolandic cortex and in posterior, basal, and polar regions of the temporal lobe; and (3) major receiver nodes (receivers of influence) in anterolateral frontal, superior parietal, and superior temporal regions. Connectivity maps derived from electrical stimulation and from resting electrocorticography (ECoG) correlations showed similar spatial distributions for the same source node. However, higher-level network topology analysis revealed differences between electrical stimulation and ECoG that were partially related to the reciprocity of connections. Together, these findings inform our understanding of large-scale corticocortical influence as well as the interpretation of functional connectivity networks.
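    A rough sketch of how projector and receiver nodes can be read off a directed connectivity matrix, assuming NumPy and a hypothetical binary matrix in which entry (i, j) marks a significant evoked response at region j when region i is stimulated; the paper's statistics and thresholds are not reproduced here.

```python
import numpy as np

# Hypothetical directed adjacency matrix: A[i, j] = 1 if stimulating region i
# evokes a significant corticocortical response at region j (toy data only).
rng = np.random.default_rng(0)
A = (rng.random((20, 20)) > 0.7).astype(int)
np.fill_diagonal(A, 0)

out_degree = A.sum(axis=1)   # regions each node influences -> "projector" score
in_degree = A.sum(axis=0)    # regions influencing each node -> "receiver" score
reciprocity = (A * A.T).sum() / A.sum()   # fraction of connections that are bidirectional

print("top projector nodes:", np.argsort(out_degree)[::-1][:3])
print("top receiver nodes:", np.argsort(in_degree)[::-1][:3])
print("reciprocity:", round(float(reciprocity), 2))
```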

    Spatiotemporal structure of intracranial electric fields induced by transcranial electric stimulation in humans and nonhuman primates

    Transcranial electric stimulation (TES) is an emerging technique developed to modulate brain function non-invasively. However, the spatiotemporal distribution of the intracranial electric fields induced by TES remains poorly understood. In particular, it is unclear how much current actually reaches the brain and how it distributes across the brain. Lack of this basic information precludes a firm mechanistic understanding of TES effects. In this study, we directly measure the spatial and temporal characteristics of the electric field generated by TES using stereotactic EEG (s-EEG) electrode arrays implanted in cebus monkeys and surgical epilepsy patients. We found a small frequency-dependent decrease (10%) in the magnitude of TES-induced potentials and negligible phase shifts over space. Electric field strengths were strongest in superficial brain regions, with maximum values of about 0.5 mV/mm. Our results provide crucial information on the underlying biophysics of TES applications in humans and inform the optimization and design of TES protocols. In addition, our findings have broad implications concerning electric field propagation in non-invasive recording techniques such as EEG/MEG.
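    As a back-of-the-envelope illustration of the field-strength numbers, the projected electric field along an sEEG shaft can be approximated as the spatial derivative of the recorded potential, i.e. adjacent-contact potential differences divided by the contact spacing. The potential values and spacing below are assumptions for illustration, not measurements from the study.

```python
import numpy as np

# Hypothetical TES-induced potential amplitudes (mV) on a linear sEEG shaft
# with 3.5 mm contact spacing (illustrative numbers only).
potentials_mv = np.array([1.9, 1.4, 1.0, 0.7, 0.5, 0.4])
spacing_mm = 3.5

# Projected field strength along the shaft (mV/mm): first spatial derivative,
# approximated by adjacent-contact differences divided by the spacing.
field_mv_per_mm = np.abs(np.diff(potentials_mv)) / spacing_mm
print(field_mv_per_mm)        # largest values at the superficial end of the shaft
print(field_mv_per_mm.max())  # ~0.14 mV/mm for these toy numbers
```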

    Deep neural networks effectively model neural adaptation to changing background noise and suggest nonlinear noise filtering methods in auditory cortex

    The human auditory system displays a robust capacity to adapt to sudden changes in background noise, allowing for continuous speech comprehension despite changes in background environments. However, despite comprehensive studies characterizing this ability, the computations that underlie this process are not well understood. The first step towards understanding a complex system is to propose a suitable model, but the classical and easily interpreted model for the auditory system, the spectro-temporal receptive field (STRF), cannot match the nonlinear neural dynamics involved in noise adaptation. Here, we utilize a deep neural network (DNN) to model neural adaptation to noise, illustrating its effectiveness at reproducing the complex dynamics at the levels of both individual electrodes and the cortical population. By closely inspecting the model's STRF-like computations over time, we find that the model alters both the gain and shape of its receptive field when adapting to a sudden noise change. We show that the DNN model's gain changes allow it to perform adaptive gain control, while the spectro-temporal change creates noise filtering by altering the inhibitory region of the model's receptive field. Further, we find that models of electrodes in nonprimary auditory cortex also exhibit noise filtering changes in their excitatory regions, suggesting differences in noise filtering mechanisms along the cortical hierarchy. These findings demonstrate the capability of deep neural networks to model complex neural adaptation and offer new hypotheses about the computations the auditory cortex performs to enable noise-robust speech perception in real-world, dynamic environments.
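    For reference, the classical STRF baseline mentioned above is a linear mapping from the time-lagged spectrogram to the neural response, commonly fit with ridge regression. The sketch below, assuming NumPy and toy data, shows that baseline only; it is not the paper's DNN or its exact fitting procedure.

```python
import numpy as np

def fit_strf(spectrogram, response, n_lags=40, alpha=1.0):
    """Fit a linear STRF by ridge regression: predict a neural response from
    the past n_lags spectrogram frames (baseline sketch, not the paper's DNN)."""
    n_t, n_f = spectrogram.shape
    X = np.zeros((n_t, n_lags * n_f))          # time-lagged design matrix
    for lag in range(n_lags):
        X[lag:, lag * n_f:(lag + 1) * n_f] = spectrogram[:n_t - lag]
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ response)
    return w.reshape(n_lags, n_f)              # lags x frequency receptive field

# Toy usage: 20 s of 100 Hz frames, 32 frequency bands, random data.
rng = np.random.default_rng(0)
strf = fit_strf(rng.standard_normal((2000, 32)), rng.standard_normal(2000))
print(strf.shape)  # (40, 32)
```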

    Distinct neural encoding of glimpsed and masked speech in multitalker situations

    Humans can easily tune in to one talker in a multitalker environment while still picking up bits of background speech; however, it remains unclear how we perceive speech that is masked and to what degree non-target speech is processed. Some models suggest that perception can be achieved through glimpses, which are spectrotemporal regions where a talker has more energy than the background. Other models, however, require the recovery of the masked regions. To clarify this issue, we directly recorded from primary and non-primary auditory cortex (AC) in neurosurgical patients as they attended to one talker in multitalker speech and trained temporal response function models to predict high-gamma neural activity from glimpsed and masked stimulus features. We found that glimpsed speech is encoded at the level of phonetic features for target and non-target talkers, with enhanced encoding of target speech in non-primary AC. In contrast, encoding of masked phonetic features was found only for the target, with a greater response latency and distinct anatomical organization compared to glimpsed phonetic features. These findings suggest separate mechanisms for encoding glimpsed and masked speech and provide neural evidence for the glimpsing model of speech perception. When humans tune in to one talker in a "cocktail party" scenario, what do we do with the non-target speech? This human intracranial study reveals new insights into the distinct mechanisms by which listeners process target and non-target speech in a crowded environment.
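    The glimpsing criterion itself is simple to state: a spectrotemporal bin is "glimpsed" when the target talker carries more energy there than the background, and "masked" otherwise. Below is a minimal sketch assuming NumPy and toy log-power values; the paper's feature extraction and exact criterion are not reproduced here.

```python
import numpy as np

def glimpse_mask(target_db, masker_db, criterion_db=0.0):
    """Mark spectrotemporal bins where the target exceeds the background
    by at least criterion_db as 'glimpsed' (sketch of the glimpsing model)."""
    return (target_db - masker_db) > criterion_db

# Toy usage: random log-power (dB) for target and background, 100 frames x 32 bands.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 6.0, size=(100, 32))
masker = rng.normal(0.0, 6.0, size=(100, 32))
glimpsed = glimpse_mask(target, masker)
print(f"{glimpsed.mean():.0%} of bins glimpsed, {1 - glimpsed.mean():.0%} masked")
```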
